
33 research outputs found

People on Drugs: Credibility of User Statements in Health Communities

Author: Danescu-Niculescu-Mizil C.
Mukherjee S.
Weikum G.
Publication date: 01/01/2017

Online health communities are a valuable source of information for patients and physicians. However, such user-generated resources are often plagued by inaccuracies and misinformation. In this work we propose a method for automatically establishing the credibility of user-generated medical statements and the trustworthiness of their authors by exploiting linguistic cues and distant supervision from expert sources. To this end we introduce a probabilistic graphical model that jointly learns user trustworthiness, statement credibility, and language objectivity. We apply this methodology to the task of extracting rare or unknown side-effects of medical drugs, one of the problems where large-scale non-expert data has the potential to complement expert medical knowledge. We show that our method can reliably extract side-effects and filter out false statements, while identifying trustworthy users who are likely to contribute valuable medical information.

MPG.PuRe

Representatively Memorable: Sampling the Right Phrase Set to Get the Text Entry Experiment Right

Author: Danescu-Niculescu-Mizil C.
Keller F.
Nelder J. A.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/05/2014

In text entry experiments, memorability is a desired property of the phrases used as stimuli. Unfortunately, to date there is no automated method to achieve this effect. As a result, researchers have to use either manually curated English-only phrase sets or sampling procedures that do not guarantee that the phrases are memorable. In response to this need, we present a novel sampling method based on two core ideas: a multiple regression model over language-independent features, and the statistical analysis of the corpus from which phrases will be drawn. Our results show that researchers can finally use a method to successfully curate their own stimuli targeting potentially any language or domain. The source code as well as our phrase sets are publicly available.

This work is supported by the 7th Framework Program of the European Commission (FP7/2007-13) under grant agreements 287576 (CASMACAT) and 600707 (tranScriptorium).

Leiva, L. A.; Sanchis-Trilles, G. (2014). Representatively Memorable: Sampling the Right Phrase Set to Get the Text Entry Experiment Right. ACM. 1709-1712. https://doi.org/10.1145/2556288.2557024

Crossref

RiuNet

Cascades: A view from Audience

Author: Bakshy E.
Berger J.
Danescu-Niculescu-Mizil C.
Dow P. A.
Goel S.
Lin Y.-R. R.
Romero D. M.
Ross S. M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 26/03/2017

Cascades on online networks have been a popular subject of study in the past decade, and there is a considerable literature on phenomena such as diffusion mechanisms, virality, cascade prediction, and peer network effects. However, a basic question has received comparatively little attention: how desirable are cascades on a social media platform from the point of view of users? While versions of this question have been considered from the perspective of the producers of cascades, any answer to this question must also take into account the effect of cascades on their audience. In this work, we seek to fill this gap by providing a consumer perspective on cascades. Users on online networks play the dual role of producers and consumers. First, we perform an empirical study of the interaction of Twitter users with retweet cascades. We measure how often users observe retweets in their home timeline, and observe a phenomenon that we term the "Impressions Paradox": the share of impressions for cascades of size k decays much more slowly than the frequency of cascades of size k. Thus, the audience for cascades can be quite large even for rare large cascades. We also measure audience engagement with retweet cascades in comparison to non-retweeted content. Our results show that cascades often rival or exceed organic content in engagement received per impression. This result is perhaps surprising in that consumers didn't opt in to see tweets from these authors. Furthermore, although cascading content is widely popular, one would expect it to eventually reach parts of the audience that may not be interested in the content. Motivated by our findings, we posit a theoretical model that focuses on the effect of cascades on the audience. Our results on this model highlight the balance between retweeting as a high-quality content selection mechanism and the role of network users in filtering irrelevant content.

arXiv.org e-Print Archive

Crossref

All Who Wander: On the Prevalence and Characteristics of Multi-community Engagement

Author: Argamon S.
Berry J. W.
Cheng J.
Danescu-Niculescu-Mizil C.
Erikson E. H.
Kraut R. E.
Lakkaraju H.
Shaw M. E.
Simpson E. H.
Yang J.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 16/03/2015

Although analyzing user behavior within individual communities is an active and rich research domain, people usually interact with multiple communities both on- and off-line. How do users act in such multi-community environments? Although there are a host of intriguing aspects to this question, it has received much less attention in the research community in comparison to the intra-community case. In this paper, we examine three aspects of multi-community engagement: the sequence of communities that users post to, the language that users employ in those communities, and the feedback that users receive, using longitudinal posting behavior on Reddit as our main data source, and DBLP for auxiliary experiments. We also demonstrate the effectiveness of features drawn from these aspects in predicting users' future level of activity. One might expect that a user's trajectory mimics the "settling-down" process in real life: an initial exploration of sub-communities before settling down into a few niches. However, we find that the users in our data continually post in new communities; moreover, as time goes on, they post increasingly evenly among a more diverse set of smaller communities. Interestingly, it seems that users who eventually leave the community are "destined" to do so from the very beginning, in the sense of showing significantly different "wandering" patterns very early on in their trajectories; this finding has potentially important design implications for community maintainers. Our multi-community perspective also allows us to investigate the "situation vs. personality" debate from language usage across different communities.

Comment: 11 pages, data available at https://chenhaot.com/pages/multi-community.html, Proceedings of WWW 2015 (updated references)

arXiv.org e-Print Archive

CiteSeerX

Crossref

Competition and Selection Among Conventions

Author: Adamic L. A.
Bakshy E.
Bendersky M.
Berger J.
Danescu-Niculescu-Mizil C.
Deutschmann P.
Eisenstein J.
Goel S.
Kooti F.
Krackhardt D.
Labov W.
Labov W. L.
Livne A.
Rogers E.
Romero D. M.
Rotabi R.
Tahmasebi N.
Tsur O.
Valente T.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 26/03/2017

In many domains, a latent competition among different conventions determines which one will come to dominate. One sees such effects in the success of community jargon, of competing frames in political rhetoric, or of terminology in technical contexts. These effects have become widespread in the online domain, where the data offers the potential to study competition among conventions at a fine-grained level. In analyzing the dynamics of conventions over time, however, even with detailed on-line data, one encounters two significant challenges. First, as conventions evolve, the underlying substance of their meaning tends to change as well; and such substantive changes confound investigations of social effects. Second, the selection of a convention takes place through the complex interactions of individuals within a community, and contention between the users of competing conventions plays a key role in the convention's evolution. Any analysis must take place in the presence of these two issues. In this work we study a setting in which we can cleanly track the competition among conventions. Our analysis is based on the spread of low-level authoring conventions in the eprint arXiv over 24 years: by tracking the spread of macros and other author-defined conventions, we are able to study conventions that vary even as the underlying meaning remains constant. We find that the interaction among co-authors over time plays a crucial role in selecting among these conventions; the distinction between more and less experienced members of the community, and the distinction between conventions with visible versus invisible effects, are both central to the underlying processes. Through our analysis we make predictions at the population level about the ultimate success of different synonymous conventions over time, and at the individual level about the outcome of "fights" between people over convention choices.

Comment: To appear in Proceedings of WWW 2017, data at https://github.com/CornellNLP/Macro

arXiv.org e-Print Archive

Crossref